The rapid spread of fake news across multilingual digital platforms undermines information reliability and societal trust. Existing detection approaches often rely on monolingual datasets or do not combine robust feature selection with context-aware embeddings, which limits their scalability and effectiveness. This study proposes a multilingual fake-news detection framework that combines translation-driven label alignment, dense context-aware embeddings from Sentence-BERT (SBERT), and Genetic Algorithm-based feature selection, followed by evaluation with multiple ensemble and traditional classifiers. The framework is validated on English and Bengali datasets: Bengali news is translated to English, and labels are assigned through cosine similarity with the English dataset. By extracting semantically rich embeddings and optimizing feature subsets, the framework reduces dimensionality while retaining discriminative features. Experimental results show that ensemble models, particularly Gradient Boosting and Random Forest, achieve the highest accuracy in both languages, with the framework outperforming traditional monolingual and non-optimized approaches. The proposed pipeline addresses multilingual alignment, optimization-driven feature selection, and ensemble evaluation in a unified architecture, offering a scalable, language-independent, and interpretable solution for fake-news detection. These findings highlight the potential of combining cross-lingual semantic understanding with evolutionary optimization to detect misinformation in diverse linguistic contexts, and they provide a foundation for future research in low-resource and multilingual settings.
Introduction
The rise of online news has accelerated the spread of fake news, posing challenges to public trust and decision-making. Conventional detection systems struggle with multilingual content, evolving linguistic patterns, and high-dimensional contextual representations. Existing methods often focus on high-resource languages, lack robust feature-selection mechanisms, and fail to generalize across diverse datasets.
Proposed Framework:
This study introduces a multilingual fake news detection framework integrating:
Translation and label alignment for English and Bengali datasets, ensuring linguistic consistency.
Context-aware embeddings using Sentence-BERT to capture semantic relationships.
Genetic Algorithm-based feature selection to retain the most discriminative features.
The framework evaluates both ensemble classifiers (Random Forest, Gradient Boosting) and traditional classifiers (Logistic Regression, SVM) to assess performance across languages. Experimental results show that combining context-aware embeddings with optimized feature selection improves accuracy, efficiency, and robustness in multilingual settings.
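The label alignment step can be illustrated with a minimal sketch. The paper states only that labels for translated Bengali news are generated through cosine similarity with the English dataset; the nearest-neighbour rule used here (each translated article takes the label of its most similar English article) is an assumption, as are the function names and the toy three-dimensional vectors, which stand in for real SBERT embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def transfer_labels(target_vecs, source_vecs, source_labels):
    """Give each target vector the label of its most similar source vector.

    Returns a list of (label, similarity) pairs, one per target vector.
    Assumed rule: nearest neighbour under cosine similarity.
    """
    results = []
    for t in target_vecs:
        sims = [cosine_similarity(t, s) for s in source_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        results.append((source_labels[best], sims[best]))
    return results


# Hypothetical toy data: in practice these would be SBERT sentence embeddings.
english_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
english_labels = ["real", "fake"]
translated_bengali_vecs = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]

aligned = transfer_labels(translated_bengali_vecs, english_vecs, english_labels)
for label, sim in aligned:
    print(label, round(sim, 3))
```

Returning the similarity alongside the label makes it possible to discard low-confidence matches with a threshold, which may matter because any labeling error at this stage propagates to the Bengali classifiers.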
Contributions:
Addresses multilingual fake news detection with translation and label alignment.
Leverages context-aware embeddings to capture cross-lingual semantics.
Applies Genetic Algorithm optimization to enhance classification performance.
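Genetic Algorithm-based feature selection can be sketched as a search over binary masks, one bit per embedding dimension. The paper does not specify the fitness function, operators, or hyperparameters, so everything below is an assumption chosen for illustration: fitness is the validation accuracy of a simple nearest-centroid classifier, selection is a size-3 tournament, and the synthetic eight-feature data stands in for SBERT embeddings.

```python
import random


def centroid_accuracy(X_train, y_train, X_val, y_val, mask):
    """Validation accuracy of a nearest-centroid classifier on masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, y in zip(X_val, y_val):
        point = [x[j] for j in cols]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y
    return correct / len(y_val)


def ga_select(X_train, y_train, X_val, y_val, pop_size=20, generations=15,
              mutation_rate=0.05, seed=0):
    """Evolve a boolean feature mask; returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    n = len(X_train[0])  # needs at least 2 features for one-point crossover

    def fitness(mask):
        return centroid_accuracy(X_train, y_train, X_val, y_val, mask)

    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: carry the current best mask over unchanged.
        next_gen = [max(population, key=fitness)[:]]
        while len(next_gen) < pop_size:
            # Tournament selection: best of 3 randomly drawn masks, twice.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, fitness(best)


def make_toy_data(rng, n_rows):
    """Synthetic data: 2 informative features followed by 6 noise features."""
    X, y = [], []
    for i in range(n_rows):
        label = i % 2
        informative = [rng.gauss(2.0 * label, 0.5) for _ in range(2)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(6)]
        X.append(informative + noise)
        y.append(label)
    return X, y


data_rng = random.Random(42)
X_train, y_train = make_toy_data(data_rng, 60)
X_val, y_val = make_toy_data(data_rng, 40)
best_mask, best_acc = ga_select(X_train, y_train, X_val, y_val)
print("selected features:", [j for j, keep in enumerate(best_mask) if keep])
print("validation accuracy:", best_acc)
```

In a full pipeline the fitness function would more plausibly wrap one of the evaluated classifiers with cross-validation, at a higher cost per generation; the nearest-centroid stand-in keeps the sketch dependency-free.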
Significance:
The proposed model offers a scalable, language-independent, and robust solution for fake news detection, extending applicability beyond monolingual, high-resource environments while efficiently handling semantic and contextual complexities.
Conclusion
The experimental results indicate that the proposed multilingual fake news detection framework performs effectively on both the English and Bengali datasets. For the English dataset (Table 1), Gradient Boosting achieved the highest accuracy of 91.40%, demonstrating strong performance on high-dimensional contextual features. Random Forest and Logistic Regression also performed competitively, with accuracies of 88.34% and 88.44%, respectively, indicating that both ensemble methods and traditional classifiers can capture relevant patterns from context-rich embeddings.
On the Bengali dataset (Table 2), Random Forest outperformed the other classifiers with an accuracy of 79.32%, while Gradient Boosting, Logistic Regression, and SVM achieved accuracies of 78.06%, 77.19%, and 76.40%, respectively. The best Bengali accuracy is about 12 percentage points below the best English accuracy (79.32% versus 91.40%), but all four classifiers fall within roughly 3 points of one another, suggesting that the translation and label alignment process enabled cross-lingual classification.
Overall, the findings indicate that integrating context-aware embeddings from Sentence-BERT with Genetic Algorithm-based feature selection enhances model performance in multilingual settings. Ensemble-based methods produced the best result on each dataset, Gradient Boosting on English and Random Forest on Bengali, highlighting their robustness and suitability for fake news detection. The framework is effective at detecting fake news in both languages, supporting its potential as a reliable, language-independent solution.
References
[1] Alarfaj, F. K., et al. (2023). Deep Dive into Fake News Detection: Feature Centric Approach. Algorithms, 16(11), 507. https://doi.org/10.3390/a16110507
[2] Al Tarawneh, M. A., Al Irr, O., Al Maaitah, K. S., Kanj, H., Aly, W. H., & F… (2025). Towards Accurate Fake News Detection: Evaluating Ensemble Methods and Feature Selection. Eur. J. Pure Appl. Math., 18(2), 6087. https://doi.org/10.29020/nybg.ejpam.v18i2.6087
[3] Wang, X., Zhang, W., & Rajtmajer, S. (2024). Monolingual and Multilingual Misinformation Detection for Low Resource Languages: A Comprehensive Survey.
[4] Ilyas, M. A., et al. (2024). Fake News Detection on Social Media Using Ensemble Classifier Combination. Information Processing & Management.
[5] Dementieva, D., Kuimov, M., & Panchenko, A. (2023). Multiverse: Multilingual Evidence for Fake News Detection. J. Imaging, 9(4), 77. https://doi.org/10.3390/jimaging9040077
[6] Bala, A., & Krishnamurthy, P. (2023). Mul-FaD: Attention-based detection of multiLingual fake news. Proc. Third Workshop on Speech and Language Technologies for Dravidian Languages, 235–238. https://doi.org/10.18653/v1/2023.dravidianlangtech-1.34
[7] Shen, X., Huang, M., Hu, Z., Cai, S., & Zhou, T. (2024). Multimodal Fake News Detection with Contrastive Learning and Optimal Transport. Frontiers in Computer Science, 6:1473457. https://doi.org/10.3389/fcomp.2024.1473457
[8] LekshmiAmmal, H.R., & Madasamy, A.K. (2025). Explainable multimodal fake news detection for low resource languages using transformers. J. Big Data, 12:46. https://doi.org/10.1186/s40537-025-01093-x
[9] İncir, R., Yağanoğlu, M., & Bozkurt, F. (2024). Genetic algorithm-based feature selection in fake news detection. Gümüşhane Univ. J. Sci. Technol., 14(3), 764-776. https://doi.org/10.17714/gumusfenbil.1396652
[10] Mishima, K., & Yamana, H. (2022). A Survey on Explainable Fake News Detection. IEICE Trans. Inf. Syst., E105.D(7), 1249-1257. https://doi.org/10.1587/transinf.2021EDR0003
[11] Jain, M.K., Gopalani, D., & Meena, Y.K. (2025). Hybrid CNN-BiLSTM model with HHO feature selection for enhanced fake news detection. Soc. Netw. Anal. Min., 15, 43. https://doi.org/10.1007/s13278-025-01455-6
[12] Saadi, A., Belhadef, H., Guessas, A., & Hafirassou, O. (2025). Enhancing Fake News Detection with Transformer Models and Summarization. Eng. Technol. Appl. Sci. Res., 15(3), 23253-23259. https://doi.org/10.48084/etasr.10678
[13] Rout, J., Mishra, M., & Saikia, M.J. (2025). Enhanced Attention-Based Transformer Model for Fake News Detection. J. Cybersecur. Priv., 5(3), 43. https://doi.org/10.3390/jcp5030043
[14] Yuan, L., Shen, H., Shi, L., Cheng, N., & Jiang, H. (2023). Explainable Fake News Analysis with Stance Information. Electronics, 12(15), 3367. https://doi.org/10.3390/electronics12153367
[15] Al-Tarawneh, M.A.B., Al-Khresheh, A., et al. (2025). Evaluating Machine Learning Approaches and Feature Selection Strategies. Eur. J. Pure Appl. Math., 18(2), 6087. https://doi.org/10.29020/nybg.ejpam.v18i2.6087
[16] Kumar, P., & Shrivastava, A. (2025). Efficient Classification Models for Fake News Detection. Int. J. Sci. Inno. Eng., 2(9). https://doi.org/10.70849/IJSCI
[17] Aljohani, E. (2024). Enhancing Arabic Fake News Detection with Data Balancing. Eng. Technol. Appl. Sci. Res., 14(4), 15947-15956. https://doi.org/10.48084/etasr.8019
[18] Men, X., & Mariano, V.Y. (2024). Explainable Fake News Detection Based on BERT and SHAP Applied to COVID-19. Int. J. Mod. Educ. Comp. Sci., 16(1), 11-22. https://doi.org/10.5815/ijmecs.2024.01.02
[19] Bichi, A.S., Ahmad, I.S., et al. (2025). Lexicon–Sentiment-Based Model for Detecting Fake News. Artificial Intelligence and Applications. https://doi.org/10.47852/bonviewAIA52023972
[20] https://www.kaggle.com/datasets/emineyetm/fake-news-detection-datasets